GI Forum, Nov 10, 2020

What is spherical geometry?

Earth-related coordinates of spatial data can be either

  • projected: measured on a two-dimensional surface, which is associated with a projection from the Earth’s surface
  • geocentric: x-, y-, and z-coordinates associated with the three principal directions, measured from Earth’s center
  • geographic: degrees latitude and longitude, associated with a datum (ellipsoidal or even spherical model of the Earth)

Geometrical computations on projected or geocentric coordinates can be done using Euclidean geometry, where all lines are staight.

Spherical (or ellipsoidal) geometry computes distances, direction, areas etc. over the surface of the sphere (ellipsoid).

Why is this worth talking about?

  • I believe that applying Eucledian geometry to geographical coordinates is the number one most common error made in spatial data science (closely followed by ignoring the support of data).

  • technological advances have made spherical geometry a good option, but inertia in legacy GIS and data science languages have slowed down its adoption / uptake

  • We have gotten used to Plate Carree, and are OK with it.

  • Even GeoJSON has written down that the world is 2D.

What is (so bad about) Plate Carrée?

Compared to other projections:

  • every global projection is miserable in its own way
  • it preserves shape (1 unit easting equals 1 unit Northing) only at the equator
  • it is hopeless at the poles and at the antimeridian

Compared to an ellipsoid, or sphere:

  • It is 2D. The world is round, bro.

Consider, for a moment, the difference between:

  • a flat projection of the sphere, and a sphere (1, 2)
  • a sphere and an ellipsoid with 1/300 flattening (2, 3)

drawing drawing drawing

Equirectangular projection: where meridians and parallels form equal rectangles (left; shape preserving at center), not squares (right: nowhere true)

drawing

Prior work

  • GeographicLib (C.F.F. Karney), “[…] a small set of C++ classes for […] solving geodesic problems.
  • PostGIS (liblwgeom)’s “geograpy” type
    • funded by palantir
    • feature list,
    • but: “The buffer and intersection functions are actually wrappers on top of a cast to geometry, and are not carried out natively in spherical coordinates. As a result, they may fail to return correct results for objects with very large extents that cannot be cleanly converted to a planar representation.

  • Google’s s2geometry
  • H3: Uber’s Hexagonal Hierarchical Spatial Index

  • R package geosphere providing distance, bearing, etc, and several alternative measures
  • R packages sp, gstat, spdep using geodetic distances in case of geographic coordinates
  • R package lwgeom interfacing parts of liblwgeom

The situation with sf and stars

st_as_sfc("POLYGON((0 0,1 0,1 1,0 1,0 0))") %>% 
    st_area()
## [1] 1
st_as_sfc("POLYGON((0 0,1 0,1 1,0 1,0 0))", crs = "EPSG:4326") %>% 
    st_area() %>%
    set_units(km^2)
## 12308.78 [km^2]

\(\Rightarrow\) measures are computed over the sphere/ellipsoid

a <- st_as_sfc("POINT(7 52)", crs = "EPSG:4326") %>% 
    st_buffer(set_units(1, degree)) %>%
    st_transform("EPSG:25832") # UTM zone 32 N
## Warning in st_buffer.sfc(., set_units(1, degree)): st_buffer does not correctly
## buffer longitude/latitude data
b <- st_as_sfc("POINT(7 52)", crs = "EPSG:4326") %>% 
    st_transform("EPSG:25832") %>% # UTM zone 32 N
    st_buffer(set_units(111, km))

\(\Rightarrow\) wrong computations raise a warning

\(\Rightarrow\) proper buffering needs round-trip through a projection

st_as_sfc("POINT(0 89)", crs = "EPSG:4326") %>% 
    st_buffer(set_units(5, degree)) %>%
    st_bbox()
## Warning in st_buffer.sfc(., set_units(5, degree)): st_buffer does not correctly
## buffer longitude/latitude data
## xmin ymin xmax ymax 
##   -5   84    5   94

\(\Rightarrow\) nonsense coordinates may come out

How is spherical geometry different?

  • in contrast to the plane \(R^2\), the sphere (\(S^2\)) is bounded
  • where in \(R^2\) polygons have an unambiguous inside, on \(S^2\) polygons divide the sphere’s surface in two parts.
  • the inside can then e.g. be associated with ring direction (e.g. left of the boundary, when traversing the ring, is inside), and flips if the ring directions are reversed
  • the empty polygon has a complement: the full polygon.
  • two new bounding structures:
    • bounding cap: smalles circumfering circle (center point + radius)
    • bounding rectangle: lat/lng range that might cross the antimeridian

Plotting virtual globes

  • Google Earth
  • Google Maps: now has “enable/disable globe view” switches between azimuthal perspective and Web Mercator.
  • Cesium (can switch between Web Mercator, perspective and orthographic)

Analysing data with geographic coordinates

Measures

distance, area, direction, …, bounding box, bounding cap

Predicates

intersects, overlaps, covers, touches, is_within_distance, …

Transformations

intersection, union, difference, sym_difference, buffer

How do we do this? Welcome s2geometry

  • an open source library written and supported by Google, powering
    • Google Maps, Earth, Earth Engine
    • Google’s serverless SQL engine bigquery GIS
  • provides all the measures, predicates, transformations
  • allows several snapping to grid options, preserving validity
  • provides, and uses, for spatial index a space-filling hilbert curve, after projecting a sphere onto six cube-faces of an earth-cube

Examples

sf_use_s2(TRUE)
a <- st_as_sfc("POINT(7 52)", crs = "EPSG:4326") %>% 
    st_buffer(set_units(1, degree)) %>%
    st_transform("EPSG:25832") # UTM zone 32 N
b <- st_as_sfc("POINT(7 52)", crs = "EPSG:4326") %>% 
    st_buffer(set_units(1, degree), max_cells = 300) %>%
    st_transform("EPSG:25832") # UTM zone 32 N
c <- st_as_sfc("POINT(7 52)", crs = "EPSG:4326") %>% 
    st_buffer(set_units(1, degree), max_cells = 100) %>%
    st_transform("EPSG:25832") # UTM zone 32 N

\(\Rightarrow\) no longer raise a warning

Migrating sf to default to s2: challenges

  • \(\Rightarrow\) for complex shapes, what is a good default value for max_cells?
  • \(\Rightarrow\) when are buffers needed, when is st_is_within_distance enough?
  • st_is_within_distance in s2 is fast (indexed), but slow in lwgeom

Outlook

  • run reverse dependency checks on all (300) reverse dependencies
  • document best practices (st_is_within_distance rather than st_buffer)
  • explain the issue: the world is round.